This paper focuses on finding the same and similar users based on location-visitation data in a mobile environment. We propose a new design that uses consumer-location data from mobile devices (smartphones, smart pads, laptops, etc.) to build a “geosimilarity network” among users. The geosimilarity network (GSN) could be used for a variety of analytics-driven applications, such as targeting advertisements to the same user on different devices or to users with similar tastes, and improving online interactions by selecting users with similar tastes. The basic idea is that two devices are similar, and thereby connected in the GSN, when they share at least one visited location. They are more similar as they visit more shared locations and as the locations they share are visited by fewer people. This paper first introduces the main ideas and ties them to theory and related work. It next introduces a specific design for selecting entities with similar location distributions, the results of which are shown using real mobile location data across seven ad exchanges. We focus on two high-level questions: (1) Does geosimilarity allow us to find different entities corresponding to the same individual, for example, as seen through different bidding systems? And (2) do entities linked by similarities in local mobile behavior show similar interests, as measured by visits to particular publishers? The results are positive for both questions. Specifically, for (1), even with the data sample's limited observability, 70%–80% of the time the same individual is connected to herself in the GSN. For (2), the GSN neighbors of visitors to a wide variety of publishers are substantially more likely also to visit those same publishers. Highly similar GSN neighbors show very substantial lift.
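As an illustration of the basic idea, the minimal Python sketch below scores a pair of devices by the locations they share, giving rarer locations more weight. The specific weighting scheme, the function names, and the toy data are assumptions made for illustration only; they are not the paper's implementation.

# Minimal sketch of a geosimilarity score between two devices. The score
# rises with the number of shared locations and with the rarity of those
# locations (fewer total visitors => higher weight). Names and the IDF-style
# weight are illustrative assumptions, not the paper's method.
import math
from collections import defaultdict

def location_popularity(visits):
    """visits: iterable of (device_id, location_id) pairs.
    Returns location_id -> number of distinct devices observed there."""
    seen = defaultdict(set)
    for device, location in visits:
        seen[location].add(device)
    return {loc: len(devs) for loc, devs in seen.items()}

def geosimilarity(locs_a, locs_b, popularity):
    """Score two devices by their shared locations, down-weighting
    locations visited by many people."""
    shared = set(locs_a) & set(locs_b)
    if not shared:
        return 0.0  # no shared location => not connected in the GSN
    return sum(1.0 / math.log(1 + popularity[loc]) for loc in shared)

# Toy example: two devices sharing a rarely visited cafe score higher than
# devices whose only shared location is a busy airport.
visits = [("A", "cafe_x"), ("B", "cafe_x"), ("A", "airport"), ("B", "airport"),
          ("C", "airport"), ("D", "airport"), ("E", "airport")]
pop = location_popularity(visits)
print(geosimilarity({"cafe_x", "airport"}, {"cafe_x", "airport"}, pop))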
Many document classification applications require that managers, client-facing employees, and the technical team understand the reasons for data-driven classification decisions. Predictive models treat documents as data to be classified, and document data are characterized by very high dimensionality, often with tens of thousands to millions of variables (words). Unfortunately, due to the high dimensionality, understanding the decisions made by document classifiers is very difficult. This paper begins by extending the most relevant prior theoretical model of explanations for intelligent systems to account for some missing elements. The main theoretical contribution is the definition of a new sort of explanation: a minimal set of words (more generally, terms) such that removing all words in the set from the document changes the predicted class from the class of interest. We present an algorithm to find such explanations, as well as a framework to assess such an algorithm’s performance. We demonstrate the value of the new approach with a case study from a real-world document classification task: classifying web pages as containing objectionable content, with the goal of allowing advertisers to choose not to have their ads appear on those pages. A second empirical demonstration on news-story topic classification shows the explanations to be concise and document-specific, and to be capable of providing understanding of the exact reasons for the classification decisions, of the workings of the classification models, and of the business application itself. We also illustrate how explaining the classifications of documents can help to improve data quality and model performance.
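To make this notion of explanation concrete, the Python sketch below greedily removes words from a document until the predicted class changes, returning the removed set as an explanation. The greedy search strategy, the predict_proba scoring function, and the threshold are assumptions for illustration; they are one simple way to search for such a set, not necessarily the paper's algorithm.

# Hedged sketch: find a (small) set of words whose removal flips the
# predicted class away from the class of interest. `predict_proba` is assumed
# to be any function returning the probability of the class of interest for a
# list of tokens; the greedy loop below is illustrative, not the paper's exact method.

def explain(words, predict_proba, threshold=0.5, max_size=10):
    """words: list of tokens in the document (bag-of-words view).
    Returns a set of words whose removal drops the score below `threshold`
    (i.e., changes the predicted class), or None if none is found."""
    removed = set()
    remaining = set(words)
    for _ in range(max_size):
        if predict_proba([w for w in words if w not in removed]) < threshold:
            return removed  # removing these words flips the classification
        if not remaining:
            break
        # Greedily remove the word whose removal lowers the score the most.
        best = min(remaining,
                   key=lambda w: predict_proba(
                       [x for x in words if x not in removed | {w}]))
        removed.add(best)
        remaining.discard(best)
    # Check once more after the final removal.
    if predict_proba([w for w in words if w not in removed]) < threshold:
        return removed
    return None

# Toy usage: a "classifier" that flags a page as objectionable (score 1.0)
# whenever it contains the word "gun"; the explanation found is {"gun"}.
toy_score = lambda tokens: 1.0 if "gun" in tokens else 0.0
print(explain(["hunting", "gun", "safety", "tips"], toy_score))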